Virtually all social science students who have studied applied statistics have been introduced to the concepts and formulas for linear correlation of two variables. Applied statistics textbooks routinely report the theoretical limits of the bivariate correlation coefficient; namely, that the coefficient is no more than +1 and no less than -1. However, no commonly used applied statistics textbook proves this. One of the best textbooks available to students of education and psychology introduces the proof (Glass and Stanley, 1970). Undoubtedly, one of the constraints placed on authors by publishers is the limited space available for detailed explanations, derivations and proofs.¹

This paper will set forth in detail a proof of the limits of the sample bivariate correlation coefficient. Since the proof requires only knowledge of algebra, most students of applied statistics at the advanced undergraduate or introductory graduate level should have little difficulty in understanding the proof. As a former instructor of graduate level introductory applied statistics, I know that the typical student can understand the proof as it is presented here.

The key to understanding statistical proofs is a presentation of detailed steps in a well articulated and coherent manner. A review of relevant statistical and mathematical concepts is also helpful (and usually required). When students are presented important statistical proofs in detail, they feel that some of the mystery and magic of mathematics has been unveiled. My experience has been that the typical student of applied statistics can follow a good number of proofs because most proofs can be presented algebraically without use of calculus. In addition to enhancing knowledge, an occasional proof often increases academic motivation.



Some Preliminary Concepts

The proof requires knowledge of several concepts in statistics and mathematics. In order to make this paper self-contained, some preliminary concepts stated in a consistent notation will be reviewed. We will review the concepts and formulas of standard scores (z scores), bivariate correlation formulas in unstandardized and standardized form, and algebraic inequalities.

Notation and Basic Formulas

Table 1 is a layout of symbolic values written in the notation to be used in this paper. The model presented in Table 1 is of two measures in unstandardized (raw score) and standardized (z score) form. Table 2 presents some familiar formulas based on unstandardized variables that will be useful for the development of the proof.



¹ This paper is one of a series contemplated for publication [see O'Brien, 1982]. Eventually I hope to present a textbook of applied statistics proofs and derivations to supplement standard applied statistics textbooks.
Table 1

Table Layout for Two Measures in Unstandardized and Standardized Form

$$
\begin{array}{l|cc|cc}
\text{Case} & X \text{ (unstandardized)} & X \text{ (standardized)} & Y \text{ (unstandardized)} & Y \text{ (standardized)} \\
\hline
1 & X_1 & (X_1 - \bar{X})/S_x = z_{x_1} & Y_1 & (Y_1 - \bar{Y})/S_y = z_{y_1} \\
2 & X_2 & (X_2 - \bar{X})/S_x = z_{x_2} & Y_2 & (Y_2 - \bar{Y})/S_y = z_{y_2} \\
3 & X_3 & (X_3 - \bar{X})/S_x = z_{x_3} & Y_3 & (Y_3 - \bar{Y})/S_y = z_{y_3} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
i & X_i & (X_i - \bar{X})/S_x = z_{x_i} & Y_i & (Y_i - \bar{Y})/S_y = z_{y_i} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
n & X_n & (X_n - \bar{X})/S_x = z_{x_n} & Y_n & (Y_n - \bar{Y})/S_y = z_{y_n} \\
\hline
\text{Sample size} & n_x & n_{z_x} & n_y & n_{z_y} \\
\text{Sample mean} & \bar{X} & \bar{z}_x & \bar{Y} & \bar{z}_y \\
\text{Sample variance} & S_x^2 & S_{z_x}^2 & S_y^2 & S_{z_y}^2
\end{array}
$$

NOTE: All sample size terms are equal; that is, $n_x = n_{z_x} = n_y = n_{z_y}$. Any of these sample size terms could be identified by just one symbol, such as $n$. We will use $n$ when it is not important to distinguish among the other sample size terms, but will use the table values above when it is necessary or important to do so.



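Table 2

Relevant Formulas for Unstandardized Measures

$$
\begin{array}{l|c|c}
 & \text{Measure } X & \text{Measure } Y \\
\hline
\text{Sample mean} & \bar{X} = \dfrac{\sum_{i=1}^{n_x} X_i}{n_x} & \bar{Y} = \dfrac{\sum_{i=1}^{n_y} Y_i}{n_y} \\
\text{Sum} & n_x \bar{X} = \sum_{i=1}^{n_x} X_i & n_y \bar{Y} = \sum_{i=1}^{n_y} Y_i \\
\text{Sample variance} & S_x^2 = \dfrac{\sum_{i=1}^{n_x} (X_i - \bar{X})^2}{n_x - 1} & S_y^2 = \dfrac{\sum_{i=1}^{n_y} (Y_i - \bar{Y})^2}{n_y - 1} \\
\text{Sum of squares} & (n_x - 1) S_x^2 = \sum_{i=1}^{n_x} (X_i - \bar{X})^2 & (n_y - 1) S_y^2 = \sum_{i=1}^{n_y} (Y_i - \bar{Y})^2
\end{array}
$$

NOTES:

1. The sample size terms are equal: $n_x = n$. Also, $n_y = n$.
2. "Sum" is simply an algebraic manipulation of "Sample Mean"; i.e., multiply over the sample size term in "Sample Mean" to get "Sum". Also, "Sum of Squares" is such a manipulation based on "Sample Variance". "Sum" and "Sum of Squares" will be useful later on.
3. Descriptive statistics for standardized scores will be developed in the body of the text.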
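As an illustrative sketch (not part of the original paper), the two manipulations described in Note 2 can be checked with exact rational arithmetic in Python. The helper name `table2_summaries` and the five raw scores are hypothetical; `fractions.Fraction` is used so that the identities hold exactly rather than approximately.

```python
from fractions import Fraction

def table2_summaries(values):
    """Return (n, mean, variance, sum_of_squares) for raw scores, computed exactly."""
    xs = [Fraction(v) for v in values]
    n = len(xs)
    mean = sum(xs) / n                       # Sample Mean
    ss = sum((x - mean) ** 2 for x in xs)    # sum of squared deviations
    variance = ss / (n - 1)                  # Sample Variance (n - 1 divisor)
    return n, mean, variance, ss

scores = [2, 4, 4, 5, 7]                     # hypothetical raw scores
n, mean, var, ss = table2_summaries(scores)

# "Sum": multiplying over the sample size recovers the total of the raw scores.
assert n * mean == sum(scores)
# "Sum of Squares": multiplying over n - 1 recovers the sum of squared deviations.
assert (n - 1) * var == ss
print(mean, var)  # 22/5 33/10
```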



Standard Scores

It will be recalled that the standard score for an unstandardized measure (raw score) is "the score minus the mean divided by the standard deviation". For case 1 of measure X in Table 1, the standard (z) score is:

$$z_{x_1} = \frac{X_1 - \bar{X}}{S_x}$$

For any (hypothetical) case, the standard score of an X measure is:

$$z_{x_i} = \frac{X_i - \bar{X}}{S_x}$$

The same procedure can be applied to Y measures. For case 1:

$$z_{y_1} = \frac{Y_1 - \bar{Y}}{S_y}$$

Similarly, for the ith (hypothetical) case, we have:

$$z_{y_i} = \frac{Y_i - \bar{Y}}{S_y}$$

Since a standard score distribution (such as in Table 1) is a distribution of variable measures, we can calculate means, standard deviations, variances, correlations, and so forth, just as we can calculate these statistics for unstandardized measures.

Most students will recall that the mean of z scores is equal to 0 and the standard deviation (and variance) of standardized scores is equal to 1. (The proof of these statements is given in the Appendix.²)
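A minimal Python sketch of the standard score definition, assuming the n - 1 (sample) standard deviation used throughout this paper, shows numerically what the Appendix proves: the z scores have mean 0 and variance 1. The function name `z_scores` and the data are hypothetical, and floating-point results are compared with a tolerance rather than exactly.

```python
import math

def z_scores(values):
    """z_i = (X_i - mean) / S, with S the sample (n - 1) standard deviation.
    Assumes at least two cases and S > 0."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / s for v in values]

x = [2.0, 4.0, 4.0, 5.0, 7.0]                # hypothetical raw scores
z = z_scores(x)
z_mean = sum(z) / len(z)
z_var = sum((v - z_mean) ** 2 for v in z) / (len(z) - 1)

print(math.isclose(z_mean, 0.0, abs_tol=1e-12), math.isclose(z_var, 1.0))  # True True
```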

The mean of X standardized scores is defined as:

$$\bar{z}_x = \frac{\sum_{i=1}^{n_{z_x}} z_{x_i}}{n_{z_x}} = 0$$

Similarly, for Y measures:

$$\bar{z}_y = \frac{\sum_{i=1}^{n_{z_y}} z_{y_i}}{n_{z_y}} = 0$$

The variance of X in z score notation is defined as:

$$S_{z_x}^2 = \frac{\sum_{i=1}^{n_{z_x}} (z_{x_i} - \bar{z}_x)^2}{n_{z_x} - 1}$$

² The Appendix contains proofs of certain concepts or relationships that may be of interest to the reader but are not crucial for the development of the proof in this paper (the theoretical limits of the sample bivariate correlation coefficient).

For the standardized Y measure the variance is defined as:

$$S_{z_y}^2 = \frac{\sum_{i=1}^{n_{z_y}} (z_{y_i} - \bar{z}_y)^2}{n_{z_y} - 1}$$



Sum of Squared Standard Scores

To understand the proof it is necessary to know the result of summing a distribution of squared standard scores. If we square each standard score for the X measure in Table 1 and sum them, we obtain:

$$\sum_{i=1}^{n} z_{x_i}^2 = z_{x_1}^2 + z_{x_2}^2 + \cdots + z_{x_n}^2$$

If we substitute the appropriate means and variances in the right hand side of the expression, we obtain:

$$\sum_{i=1}^{n} z_{x_i}^2 = \frac{(X_1 - \bar{X})^2}{S_x^2} + \frac{(X_2 - \bar{X})^2}{S_x^2} + \cdots + \frac{(X_n - \bar{X})^2}{S_x^2}$$

Since $S_x^2$ is a constant, we can factor it outside, and write:

$$\sum_{i=1}^{n} z_{x_i}^2 = \frac{1}{S_x^2}\left[(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2\right]$$

Rewriting the right hand side in summation notation, we obtain:

$$\sum_{i=1}^{n} z_{x_i}^2 = \frac{\sum_{i=1}^{n_x} (X_i - \bar{X})^2}{S_x^2}$$

From Table 2, we know that we can substitute the sum of squares term into the numerator on the right hand side. This results in:

$$\sum_{i=1}^{n} z_{x_i}^2 = \frac{(n_x - 1) S_x^2}{S_x^2} = n_x - 1 = n - 1$$

(Recall that $n_x - 1 = n - 1$.)

If we were to work through the same steps for Y, we would obtain:

$$\sum_{i=1}^{n} z_{y_i}^2 = \frac{(n_y - 1) S_y^2}{S_y^2} = n_y - 1 = n - 1$$

(Recall that $n_y - 1 = n - 1$.)

These relationships between squared z scores and sample size are very important for the proof later on. They will be summarized later for easy reference.
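Because $z_{x_i}^2 = (X_i - \bar{X})^2 / S_x^2$ involves only the variance, the result $\sum z_{x_i}^2 = n - 1$ can be checked exactly with rational arithmetic, even when $S_x$ itself is irrational. The sketch below is an illustration built on that observation; the function name `sum_squared_z` and the data are hypothetical.

```python
from fractions import Fraction

def sum_squared_z(values):
    """Exact sum of squared z scores: each z_i^2 = (X_i - mean)^2 / S^2."""
    xs = [Fraction(v) for v in values]
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)   # sample variance S^2
    return sum((x - mean) ** 2 / s2 for x in xs)

x = [2, 4, 4, 5, 7]    # hypothetical X scores, n = 5
y = [1, 3, 2, 6, 8]    # hypothetical Y scores, n = 5
print(sum_squared_z(x), sum_squared_z(y))  # 4 4, i.e. n - 1 for each measure
```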



Correlation Formulas

Unstandardized Form

Using the notation and variables in Tables 1 and 2, the unstandardized form correlation for two measures (X and Y) is defined as follows:

$$r_{xy} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\dfrac{\sum_{i=1}^{n_x} (X_i - \bar{X})^2}{n_x - 1}}\,\sqrt{\dfrac{\sum_{i=1}^{n_y} (Y_i - \bar{Y})^2}{n_y - 1}}}$$

Note that the numerator contains the term $n - 1$ because it is not important or necessary to distinguish between $n_x - 1$ or $n_y - 1$. However, in the denominator it is helpful to distinguish $n_x - 1$ from $n_y - 1$. In any case, all of the sample size terms would be equal to the same numerical value if a correlation coefficient were computed on a set of data ($n_x - 1 = n_y - 1 = n - 1$).
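The unstandardized formula translates directly into code. This Python sketch follows it term by term (cross-products over $n - 1$ in the numerator, the two sample standard deviations in the denominator). The function name `pearson_r` and the example data are hypothetical, and both lists are assumed to have the same length and non-zero variance.

```python
import math

def pearson_r(x, y):
    """Sample correlation r_xy from the unstandardized (raw score) formula."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number of cases")
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Numerator: (1 / (n - 1)) * sum of cross-products of deviations.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    # Denominator: S_x * S_y, each with its own n - 1 divisor.
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return cross / (s_x * s_y)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # hypothetical data
print(round(r, 4))  # 0.7746
```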



Standardized Form

The correlation of measure X and measure Y in standard score form is defined as follows:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n} (z_{x_i} - \bar{z}_x)(z_{y_i} - \bar{z}_y)}{\sqrt{\dfrac{\sum_{i=1}^{n_{z_x}} (z_{x_i} - \bar{z}_x)^2}{n_{z_x} - 1}}\,\sqrt{\dfrac{\sum_{i=1}^{n_{z_y}} (z_{y_i} - \bar{z}_y)^2}{n_{z_y} - 1}}}$$

It is proven in the Appendix that this correlation formula is equal to:

$$r_{z_x z_y} = \frac{1}{n-1}\sum_{i=1}^{n} z_{x_i} z_{y_i}$$

If we rearrange this formula by multiplying over the $n - 1$ term, we obtain:

$$\sum_{i=1}^{n} z_{x_i} z_{y_i} = (n-1)\, r_{z_x z_y}$$

This relationship will be useful in the proof; it will be restated for easy reference later.

The reader may recall that the same correlation coefficient results when the variables are in raw score form or standard score form. That is:

$$r_{xy} = r_{z_x z_y}$$

This statement is proven in the Appendix. We will restate it prior to the proof for the reader's convenience.
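The two relationships just stated, $\sum z_{x_i} z_{y_i} = (n-1)\, r_{z_x z_y}$ and $r_{xy} = r_{z_x z_y}$, can be spot-checked numerically. In this sketch (hypothetical helper `z_scores` and data, floating-point comparisons), the correlation computed from the simplified standardized form is compared with the raw-score correlation for the same data.

```python
import math

def z_scores(values):
    """Standard scores using the sample (n - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / s for v in values]

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
zx, zy = z_scores(x), z_scores(y)

# Sum of cross-products of z scores, and the simplified standardized correlation.
sum_zz = sum(a * b for a, b in zip(zx, zy))
r_z = sum_zz / (n - 1)

# Raw-score correlation; with equal sample sizes the n - 1 terms cancel.
x_bar, y_bar = sum(x) / n, sum(y) / n
sp = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
ss_x = sum((a - x_bar) ** 2 for a in x)
ss_y = sum((b - y_bar) ** 2 for b in y)
r_xy = sp / math.sqrt(ss_x * ss_y)

print(math.isclose(r_z, r_xy), math.isclose(sum_zz, (n - 1) * r_xy))  # True True
```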

Inequalities

Before starting the proof it is necessary to review one further topic: algebraic inequalities. In the proof we are required to manipulate one form of inequality: the form "greater than or equal to" and "less than or equal to". An example will serve as a refresher.

For two variables, say A and B, we can write:

$$A \ge B$$

which means "A is greater than or equal to B". Equivalently, we can write:

$$B \le A$$

which means the same thing: "B is less than or equal to A". For example, $3 \ge 1$, or equivalently $1 \le 3$.

All of this should be obvious. What students sometimes forget is what happens when multiplying or dividing by negative quantities. For example, if $3 \ge 1$ and we multiply this inequality by $-1$, we obtain:

$$-1[(3) \ge (1)] \quad\Rightarrow\quad -3 \le -1$$

That is, the inequality sign is reversed when multiplying by a negative number.

The same result occurs for more complex expressions. For example:

$$1 - A \ge B$$

Multiplying each side of the inequality by $-1$, we get:

$$-1[(1 - A) \ge (B)] \quad\Rightarrow\quad -(1 - A) \le -B \quad\text{or}\quad A - 1 \le -B$$

Example:

$$1 - k \ge 0$$

Multiplying through by $-1$, we obtain:

$$-1[(1 - k) \ge (0)] \quad\Rightarrow\quad -(1 - k) \le 0 \quad\text{or}\quad k - 1 \le 0$$



Summary of Important Concepts

We have reviewed standard scores (z), correlation formulas and algebraic inequalities. All of these concepts are important to understand the proof that follows. For the reader's convenience, we will summarize these concepts for easy reference. This is done in Table 3.

Table 3

Summary of Important Concepts

$$\sum_{i=1}^{n} z_{x_i}^2 = \sum_{i=1}^{n} z_{y_i}^2 = n - 1$$

$$r_{xy} = r_{z_x z_y}$$

$$\sum_{i=1}^{n} z_{x_i} z_{y_i} = (n-1)\, r_{z_x z_y} = (n-1)\, r_{xy}$$

$$-1[(1 - A) \ge (B)] \quad\Rightarrow\quad A - 1 \le -B$$
Proof

We are now ready to present the proof. Formally, we want to prove the following statements:

$$r_{xy} \ge -1$$

$$r_{xy} \le +1$$

Writing both statements in one linear form:

$$-1 \le r_{xy} \le +1$$

This states the same information as the above two separate statements.

The proof consists of two parts: one part shows the lower limit of $r_{xy}$ (i.e., $r_{xy} \ge -1$), and the second part shows the upper limit of $r_{xy}$ (i.e., $r_{xy} \le +1$). We will prove the upper limit first.



Proof that $r_{xy} \le +1$

To prove this limit, we will perform algebraic manipulations on a statement which is mathematically true. That statement is:

$$\sum_{i=1}^{n} (z_{x_i} - z_{y_i})^2 \ge 0$$

In words, the statement means: the sum of squared differences of n standardized value pairs will always be equal to or greater than 0. The reader may refer to Table 1 for clarification. The squared differences are taken for each row (pair) of $z_x$ and $z_y$ values, starting at $z_{x_1}, z_{y_1}$ and continuing down to the last pair of z's ($z_{x_n}, z_{y_n}$).

Most students readily agree that this sum will ordinarily be greater than 0, since no squared term can be negative. But can it ever be exactly equal to 0? Yes, theoretically it can. Referring to Table 1, if one imagines each standardized X and Y measure to have the same numerical value,³ then it is apparent that each difference will be 0; so, the squared value of 0 is also 0. Now, a sum of squared 0's will itself be equal to 0. While it may be unlikely to occur in practice, it is only required that $\sum_{i=1}^{n} (z_{x_i} - z_{y_i})^2 \ge 0$ be true in a mathematical sense. Thus, the statement is true. We will expand this squared sum, perform algebraic manipulations and substitutions, and arrive at the proof for the upper limit of the sample correlation coefficient.

The actual steps in the derivation will now be presented. Notes pertaining to the algebra are provided for the reader's reference. Refer to Tables 1, 2 and 3 as needed. It is suggested that the reader first examine each algebraic statement and then read the accompanying note for explanation.

³ That is, within pairs, not across all pairs. Example:

$$
\begin{array}{rr}
z_{x_i} & z_{y_i} \\
\hline
1.41 & 1.41 \\
-.68 & -.68 \\
.05 & .05 \\
\vdots & \vdots
\end{array}
$$




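Step 1.

$$\sum_{i=1}^{n} (z_{x_i} - z_{y_i})^2 \ge 0$$

Note: A restatement from before.

Step 2.

$$\sum_{i=1}^{n} \left(z_{x_i}^2 + z_{y_i}^2 - 2 z_{x_i} z_{y_i}\right) \ge 0$$

Note: Squaring each term, we obtain an expansion of the binomial in this form: $(A - B)^2 = A^2 + B^2 - 2AB$.

Step 3.

$$\sum_{i=1}^{n} z_{x_i}^2 + \sum_{i=1}^{n} z_{y_i}^2 - 2\sum_{i=1}^{n} z_{x_i} z_{y_i} \ge 0$$

Note: Distributing the summation operator to each term, and bringing the constant 2 outside the summation sign.

Step 4.

$$(n-1) + (n-1) - 2(n-1)\, r_{xy} \ge 0$$

Note: This next step is very important. We make three substitutions, all from Table 3. They are:

$$\sum_{i=1}^{n} z_{x_i}^2 = n - 1, \qquad \sum_{i=1}^{n} z_{y_i}^2 = n - 1, \qquad \sum_{i=1}^{n} z_{x_i} z_{y_i} = (n-1)\, r_{z_x z_y} = (n-1)\, r_{xy}$$

Step 5.

$$2(n-1) - 2(n-1)\, r_{xy} \ge 0$$

Note: Collecting the like terms of $(n-1)$.

Step 6.

$$2(n-1)\,[1 - r_{xy}] \ge 0$$

Note: Factoring out the $2(n-1)$ term.

Step 7.

$$\frac{2(n-1)\,[1 - r_{xy}]}{2(n-1)} \ge \frac{0}{2(n-1)}, \quad\text{that is,}\quad 1 - r_{xy} \ge 0$$

Note: Dividing each side of the inequality by $2(n-1)$, which does not change the inequality sign, as $2(n-1)$ is always positive because n must always be at least 2 (a standard deviation requires at least two cases).

Step 8.

$$-1[(1 - r_{xy}) \ge 0] \quad\Rightarrow\quad r_{xy} - 1 \le 0$$

Note: Here we make use of multiplying an inequality by a negative number. Let us multiply each side of the inequality by $-1$ (see Table 3), which reverses the inequality sign and reverses the signs of the terms in $1 - r_{xy}$.

Step 9.

$$r_{xy} - 1 + 1 \le 0 + 1$$

Note: Now, add +1 to each side.

Step 10.

$$r_{xy} \le +1$$

Note: This gives us the upper limit.

END OF PROOF FOR UPPER LIMIT.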
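Steps 1 through 6 amount to the identity $\sum (z_{x_i} - z_{y_i})^2 = 2(n-1)(1 - r_{xy})$, so the upper limit follows because the left side can never be negative. The Python sketch below checks that identity on hypothetical data; it illustrates the algebra and is not a substitute for the proof. The helper `z_scores` is a hypothetical name and uses the sample (n - 1) standard deviation.

```python
import math

def z_scores(values):
    """Standard scores using the sample (n - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / s for v in values]

x = [2.0, 4.0, 4.0, 5.0, 7.0]   # hypothetical data
y = [1.0, 3.0, 2.0, 6.0, 8.0]
n = len(x)
zx, zy = z_scores(x), z_scores(y)
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)   # Table 3: sum z_x z_y = (n - 1) r

step1 = sum((a - b) ** 2 for a, b in zip(zx, zy))  # Step 1: sum of squared differences
step6 = 2 * (n - 1) * (1 - r)                      # Step 6: factored form 2(n - 1)[1 - r]

print(math.isclose(step1, step6), r <= 1)  # True True
```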



Part two of the proof will be much simpler because the structure of this part of the proof is very much like the first part. We will follow the same basic steps. We start out with a statement that is mathematically true, namely:

$$\sum_{i=1}^{n} (z_{x_i} + z_{y_i})^2 \ge 0$$

Again, this statement is true in a mathematical sense, even though the "equals 0" aspect is very unlikely to occur in statistical practice. The development of the proof, with appropriate notes, follows.



Step 1.

$$\sum_{i=1}^{n} (z_{x_i} + z_{y_i})^2 \ge 0$$

Note: The starting statement, restated.

Step 2.

$$\sum_{i=1}^{n} \left(z_{x_i}^2 + z_{y_i}^2 + 2 z_{x_i} z_{y_i}\right) \ge 0$$

Note: Squaring each term results in a binomial expansion in this form: $(A + B)^2 = A^2 + B^2 + 2AB$.

Step 3.

$$\sum_{i=1}^{n} z_{x_i}^2 + \sum_{i=1}^{n} z_{y_i}^2 + 2\sum_{i=1}^{n} z_{x_i} z_{y_i} \ge 0$$

Note: Distributing the summation operator and bringing out the 2.

Step 4.

$$(n-1) + (n-1) + 2(n-1)\, r_{xy} \ge 0$$

Note: Making the same three substitutions as in part one.

Step 5.

$$2(n-1)\,[1 + r_{xy}] \ge 0$$

Note: Adding like terms and factoring.

Step 6.

$$1 + r_{xy} \ge 0$$

Note: Dividing each side by $2(n-1)$.

Step 7.

$$1 + r_{xy} - 1 \ge 0 - 1$$

Note: Adding $-1$ to each side.

Step 8.

$$r_{xy} \ge -1$$

Note: Simplifying.

END OF PROOF FOR LOWER LIMIT
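Likewise, Steps 1 through 5 of part two reduce to the identity $\sum (z_{x_i} + z_{y_i})^2 = 2(n-1)(1 + r_{xy})$. The sketch below checks it on hypothetical, negatively related data, under the same assumptions as before (hypothetical helper `z_scores`, sample standard deviations, floating-point tolerance).

```python
import math

def z_scores(values):
    """Standard scores using the sample (n - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / s for v in values]

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data with a strong negative relation
y = [9.0, 7.0, 8.0, 4.0, 2.0]
n = len(x)
zx, zy = z_scores(x), z_scores(y)
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

step1 = sum((a + b) ** 2 for a, b in zip(zx, zy))  # Step 1: sum of squared sums
step5 = 2 * (n - 1) * (1 + r)                      # Step 5: factored form 2(n - 1)[1 + r]

print(math.isclose(step1, step5), r >= -1)  # True True
```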



We have just proven that $-1 \le r_{xy} \le +1$. See the Appendix for additional proofs of related material.
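As a numerical illustration only (random data cannot prove a bound), the sketch below computes $r_{xy}$ for many randomly generated hypothetical samples and confirms each lies in $[-1, +1]$, allowing a tiny floating-point tolerance. It also shows the two equality cases implied by the proof: when every $z_{x_i} = z_{y_i}$ (Y a positive linear function of X) the sum in part one is 0 and $r_{xy} = +1$; when every $z_{x_i} = -z_{y_i}$ the sum in part two is 0 and $r_{xy} = -1$. The function name `pearson_r`, the seed, and the sample sizes are arbitrary choices.

```python
import math
import random

def pearson_r(x, y):
    """Sample correlation r_xy from the raw-score formula (the n - 1 terms cancel)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

rng = random.Random(1982)   # arbitrary fixed seed so the run is repeatable
tol = 1e-12                 # allowance for floating-point rounding
for _ in range(1000):
    n = rng.randint(3, 30)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = pearson_r(x, y)
    assert -1 - tol <= r <= 1 + tol

x = [1.0, 2.0, 3.0, 4.0]
r_plus = pearson_r(x, [3 * v + 2 for v in x])      # z_y = z_x for every case
r_minus = pearson_r(x, [-0.5 * v + 7 for v in x])  # z_y = -z_x for every case
print(round(r_plus, 10), round(r_minus, 10))  # 1.0 -1.0
```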
APPENDIX

Selected Proofs

1. That the mean of standard scores is equal to 0

We will start with the definition of the mean of z scores for the X measure:

$$\bar{z}_x = \frac{\sum_{i=1}^{n_{z_x}} z_{x_i}}{n_{z_x}}$$

Expanding the right side:

$$\bar{z}_x = \frac{1}{n_{z_x}}\left[\frac{X_1 - \bar{X}}{S_x} + \frac{X_2 - \bar{X}}{S_x} + \cdots + \frac{X_n - \bar{X}}{S_x}\right]$$

Factoring the constant $S_x$ outside and rewriting the sum of deviations in summation notation:

$$\bar{z}_x = \frac{1}{n_{z_x} S_x}\sum_{i=1}^{n_x} (X_i - \bar{X})$$

Distributing the summation sign inside the parentheses:

$$\bar{z}_x = \frac{1}{n_{z_x} S_x}\left(\sum_{i=1}^{n_x} X_i - \sum_{i=1}^{n_x} \bar{X}\right)$$

Since $\sum_{i=1}^{n_x} X_i = n_x \bar{X}$ and the sum of the constant $\bar{X}$ is equal to $n_x \bar{X}$,⁴ we have:

$$\bar{z}_x = \frac{1}{n_{z_x} S_x}\left(n_x \bar{X} - n_x \bar{X}\right) = 0$$

Thus the mean of z scores is equal to 0. Similar reasoning for the Y measure will produce the same result, namely:

$$\bar{z}_y = \frac{1}{n_{z_y} S_y}\left(n_y \bar{Y} - n_y \bar{Y}\right) = 0$$

Therefore, variables in standardized form have mean equal to 0.



⁴ Recall that when taking the sum of a constant (say C), we have:

$$\sum_{i=1}^{n} C = C + C + C + \cdots + C = nC$$

That is, the sum of a constant is equal to the constant times the number of terms added (in this case, n).
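The key step in this proof is that the deviations $X_i - \bar{X}$ sum to exactly 0, so $\bar{z}_x = 0$ whatever the value of $S_x$. A short sketch with exact rational arithmetic (hypothetical data) checks each piece: the "Sum" identity from Table 2, the sum of the constant $\bar{X}$, and the zero total of the deviations.

```python
from fractions import Fraction

x = [Fraction(v) for v in (2, 4, 4, 5, 7)]    # hypothetical X scores
n = len(x)
x_bar = sum(x) / n

sum_x = sum(x)                                  # equals n * x_bar ("Sum", Table 2)
sum_const = sum(x_bar for _ in range(n))        # sum of the constant x_bar: n * x_bar
deviation_total = sum(v - x_bar for v in x)     # = sum_x - sum_const

print(sum_x == n * x_bar, sum_const == n * x_bar, deviation_total)  # True True 0
```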



2. That the variance and standard deviation of standard scores are equal to 1

By definition, the variance for X measures in standard score form is:

$$S_{z_x}^2 = \frac{\sum_{i=1}^{n_{z_x}} (z_{x_i} - \bar{z}_x)^2}{n_{z_x} - 1}$$

Since we know that $\bar{z}_x = 0$, we now have:

$$S_{z_x}^2 = \frac{\sum_{i=1}^{n_{z_x}} z_{x_i}^2}{n_{z_x} - 1}$$

If we rewrite $z_{x_i}$ in terms of the unstandardized mean and standard deviation:

$$S_{z_x}^2 = \frac{\sum_{i=1}^{n_x} \left(\dfrac{X_i - \bar{X}}{S_x}\right)^2}{n_{z_x} - 1}$$

Rearranging terms:

$$S_{z_x}^2 = \frac{1}{n_{z_x} - 1}\cdot\frac{1}{S_x^2}\sum_{i=1}^{n_x} (X_i - \bar{X})^2$$

From Table 2, we can substitute the "sum of squares" for the X measure into the numerator of $S_{z_x}^2$. This results in:

$$S_{z_x}^2 = \frac{1}{n_{z_x} - 1}\cdot\frac{1}{S_x^2}\,(n_x - 1)(S_x^2)$$

Since $n_x = n_{z_x}$, we can cancel terms, leaving:

$$S_{z_x}^2 = 1$$

Similar reasoning for Y standardized measures will produce, as the next to the last step in the derivation:

$$S_{z_y}^2 = \frac{1}{n_{z_y} - 1}\cdot\frac{1}{S_y^2}\,(n_y - 1)(S_y^2)$$

Since $n_y = n_{z_y}$, this also reduces to $S_{z_y}^2 = 1$.

In each case, the standard deviation for the appropriate variance term is simply the square root of 1. That is:

$$S_{z_x} = \sqrt{S_{z_x}^2} = \sqrt{1} = 1 \qquad\text{and}\qquad S_{z_y} = \sqrt{S_{z_y}^2} = \sqrt{1} = 1$$

Thus, the variance and standard deviation of z scores are equal to 1.
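The derivation can be followed numerically step by step. This sketch (hypothetical data, floating-point arithmetic) evaluates $S_{z_x}^2$ three ways: from the definition, after using $\bar{z}_x = 0$, and from the form $\frac{1}{n-1}\cdot\frac{1}{S_x^2}(n-1)S_x^2$ reached after the sum-of-squares substitution. All three, and the standard deviation, should equal 1 up to rounding.

```python
import math

x = [2.0, 4.0, 4.0, 5.0, 7.0]                   # hypothetical X scores
n = len(x)
mean = sum(x) / n
ss = sum((v - mean) ** 2 for v in x)            # sum of squares (Table 2)
s2 = ss / (n - 1)                               # sample variance S_x^2
z = [(v - mean) / math.sqrt(s2) for v in x]
z_bar = sum(z) / n

var_definition = sum((v - z_bar) ** 2 for v in z) / (n - 1)  # definition of S_zx^2
var_mean_zero = sum(v ** 2 for v in z) / (n - 1)             # using z_bar = 0
var_substituted = (1 / (n - 1)) * (1 / s2) * ((n - 1) * s2)  # after substituting (n - 1) S_x^2
sd_z = math.sqrt(var_definition)

print([round(v, 10) for v in (var_definition, var_mean_zero, var_substituted, sd_z)])
# [1.0, 1.0, 1.0, 1.0]
```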



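3. That $r_{xy} = r_{z_x z_y}$

We want to show that when measures X and Y are converted to standard scores and correlated, the resulting correlation is the same as the correlation between the unstandardized (raw) measures of X and Y. Let us first rewrite the correlation formula for z scores:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n} (z_{x_i} - \bar{z}_x)(z_{y_i} - \bar{z}_y)}{\sqrt{\dfrac{\sum_{i=1}^{n_{z_x}} (z_{x_i} - \bar{z}_x)^2}{n_{z_x} - 1}}\,\sqrt{\dfrac{\sum_{i=1}^{n_{z_y}} (z_{y_i} - \bar{z}_y)^2}{n_{z_y} - 1}}}$$

Since $\bar{z}_x = \bar{z}_y = 0$, we can simplify to get:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n} z_{x_i} z_{y_i}}{\sqrt{\dfrac{\sum_{i=1}^{n_{z_x}} z_{x_i}^2}{n_{z_x} - 1}}\,\sqrt{\dfrac{\sum_{i=1}^{n_{z_y}} z_{y_i}^2}{n_{z_y} - 1}}}$$

In the denominator, we recognize that $\sum_{i=1}^{n_{z_x}} z_{x_i}^2 = n_{z_x} - 1$ and $\sum_{i=1}^{n_{z_y}} z_{y_i}^2 = n_{z_y} - 1$. Substituting these values, we obtain: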
$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n} z_{x_i} z_{y_i}}{\sqrt{\dfrac{n_{z_x} - 1}{n_{z_x} - 1}}\,\sqrt{\dfrac{n_{z_y} - 1}{n_{z_y} - 1}}}$$

The denominator cancels out completely, leaving:

$$r_{z_x z_y} = \frac{1}{n-1}\sum_{i=1}^{n} z_{x_i} z_{y_i}$$

(Recall that this relationship was used in the proof for the limits of $r_{xy}$.)

Now, expanding the z score terms:

$$r_{z_x z_y} = \frac{1}{n-1}\sum_{i=1}^{n} \frac{(X_i - \bar{X})}{S_x}\cdot\frac{(Y_i - \bar{Y})}{S_y}$$

This is identical to:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(S_x)(S_y)}$$

Recognizing that $(S_x)(S_y) = \sqrt{(S_x^2)(S_y^2)}$, we can write:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{(S_x^2)(S_y^2)}}$$

Rewriting the variance terms in the denominator in raw score terms (see Table 2):

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\dfrac{\sum_{i=1}^{n_x} (X_i - \bar{X})^2}{n_x - 1}}\,\sqrt{\dfrac{\sum_{i=1}^{n_y} (Y_i - \bar{Y})^2}{n_y - 1}}}$$

This is precisely the form for $r_{xy}$ that was defined earlier in the paper. Therefore, the correlation between measures in raw score and z score forms is identical.
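Because this proof shows that standardizing leaves the correlation unchanged, applying one and the same correlation routine to raw scores and to their z scores should give the same value up to rounding. In this sketch the function names `correlation` and `standardize` and the data are hypothetical; `correlation` uses the full definitional formula with each mean subtracted, so it does not assume $\bar{z} = 0$.

```python
import math

def correlation(x, y):
    """Definitional sample correlation: deviations from each variable's own mean."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)

def standardize(values):
    """Convert raw scores to z scores with the sample (n - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / s for v in values]

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]    # hypothetical raw scores
y = [2.0, 1.0, 5.0, 0.0, 6.0, 8.0]
r_raw = correlation(x, y)
r_std = correlation(standardize(x), standardize(y))
print(math.isclose(r_raw, r_std))  # True
```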